
Observables and Operators

In quantum mechanics, we are interested in the properties of particles, such as their position, momentum, energy, and so on. In the previous page, we dealt with the abstract state of a particle, represented by a state vector $|\psi\rangle$. Now, we will discuss how we can measure these properties and how they are represented mathematically.


Observables

In quantum mechanics, an observable is a physical quantity that can be measured. These are properties of a particle that we can observe, such as position $x$, momentum $p$, energy $E$, and so on. Observables are represented as linear operators acting on the state vector of the particle. These operators are denoted by a hat, such as $\hat{x}$, $\hat{p}$, $\hat{H}$, and so on.

Suppose we have an observable $\hat{E}$ for the energy of a particle. How can we find out which energies are possible for the particle? These possible energies determine the basis of the space in which the state vector of the particle lies.

When we measure the energy, suppose we get a value $E_i$. At that moment, the state vector of the particle collapses to the state vector $|E_i\rangle$. These states are called definite states because they represent a definite value of the observable, not a superposition of values.

For all energies, then, each energy $E_i$ corresponds to a definite state $|E_i\rangle$. It turns out that $E_i$ is an eigenvalue of the operator $\hat{E}$, and $|E_i\rangle$ is the corresponding eigenvector, known as an eigenstate.

Properties of Observables

First, observables have real eigenvalues. This is to prevent observables (like energy) from being imaginary, which is not physically meaningful.

Second, eigenvectors must span the entire space. This ensures that any state vector can be expressed as a linear combination of the eigenvectors. If the eigenvectors do not span the space, then there are states that cannot be represented by the observable. But you cannot have a particle with "none" energy or "none" momentum; it must have some value. Hence, the eigenvectors must span the space.

Third, eigenvectors must be orthogonal. This ensures that the eigenvectors are independent of each other. If not, then one eigenvector could be expressed as a linear combination of the others, which would mean that a supposedly definite state is actually a superposition of other definite states, contradicting our principles.

In the future, we will show that these properties make observables something known as Hermitian operators.
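
As a quick numerical illustration of these properties, we can build a Hermitian matrix with NumPy and check that its eigenvalues are real and its eigenvectors form an orthonormal basis. This is a minimal sketch; the matrix below is arbitrary and only serves as an example.

```python
import numpy as np

# Construct an arbitrary Hermitian matrix: (M + M†)/2 is always Hermitian.
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (M + M.conj().T) / 2

eigenvalues, V = np.linalg.eigh(H)      # eigh is specialized for Hermitian matrices

# 1. The eigenvalues are real (eigh already returns them as real numbers).
print(eigenvalues)

# 2 & 3. The eigenvectors (columns of V) are orthonormal and span the space,
# so V† V = I.
print(np.allclose(V.conj().T @ V, np.eye(3)))   # True
```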

Born Rule

When we measure an observable, we get a definite value, not a superposition of values. The probability of getting a particular value is given by the Born rule. It is technically a postulate of quantum mechanics (a postulate is a statement that is assumed to be true without proof), but we can see how it can be derived from physical intuitions. (We will take a similar approach when discussing the Schrödinger equation, which is another postulate, but still has a physical basis. After all, in order to come up with postulates, physicists can't just make up random rules; they must be based on experimental observations.)

Suppose we have a state vector $|\psi\rangle$, and for simplicity, consider a two-dimensional vector space. This means that the state vector can be expressed as a linear combination of two basis vectors $|E_1\rangle$ and $|E_2\rangle$:

$$|\psi\rangle = c_1 |E_1\rangle + c_2 |E_2\rangle$$

Assuming $|E_1\rangle$ and $|E_2\rangle$ are orthonormal, we can see them on a plane. One intuition is that for a state vector $|\psi\rangle$, the probability of measuring $E_1$ is higher if the vector is closer to $|E_1\rangle$.

To get the component of $|\psi\rangle$ along $|E_1\rangle$, we take the inner product of the two vectors:

$$c_1 = \langle E_1 | \psi \rangle$$

One guess is that the probability of measuring $E_1$ is the magnitude of the component $c_1$:

$$P(E_1) \overset{?}{=} |c_1|$$

But this is not quite correct. Another guess would be that the probability is the square of the magnitude of the component. When we drag the state vector along a circle, the sum of squares of the components is always $1$. This would make sense because the sum of probabilities of all possible outcomes must be $1$. Of course, this doesn't actually prove the Born rule, but it gives us an intuition behind it. Instead, let's see why the first guess is incorrect.

Flaws in the Guess

Let's lay down what we know so far, and what we need from the probability $P(E_i)$:

  1. The state vector can be expressed as a linear combination of the basis vectors $|E_1\rangle$ and $|E_2\rangle$:

     $$|\psi\rangle = c_1 |E_1\rangle + c_2 |E_2\rangle$$

  2. The probability of measuring $E_i$ is guessed to be:

     $$P(E_i) \overset{?}{=} |c_i|$$

  3. All probabilities must sum to $1$:

     $$P(E_1) + P(E_2) = |c_1| + |c_2| = 1$$

The problem with the guess is that it fails the sum condition under a change of coordinates. Suppose we have the following setup:

In this setup, we have an orthonormal basis $|E_1\rangle$ and $|E_2\rangle$ representing the energies of the system. The state vector is a linear combination of these basis vectors:

$$|\psi\rangle = \frac{1}{2} |E_1\rangle + \frac{1}{2} |E_2\rangle$$

The coefficients correctly sum to $1$: $\tfrac{1}{2} + \tfrac{1}{2} = 1$.

Next, let $|L_1\rangle$ and $|L_2\rangle$ be a new basis that represents the angular momentum of the system:

$$|L_1\rangle = \frac{1}{\sqrt{2}} \left( |E_1\rangle + |E_2\rangle \right), \qquad |L_2\rangle = \frac{1}{\sqrt{2}} \left( |E_1\rangle - |E_2\rangle \right)$$

Under this new basis, the state vector is now a linear combination of $|L_1\rangle$ and $|L_2\rangle$:

$$|\psi\rangle = \frac{1}{\sqrt{2}} |L_1\rangle + 0 |L_2\rangle$$

These coefficients do not sum to $1$; they sum to $\tfrac{1}{\sqrt{2}} \approx 0.707$. Contradiction! We would need to rescale $|\psi\rangle$ under the new basis to make the coefficients sum to $1$. But this means that depending on the quantity we measure, the state vector changes, which is not physically meaningful.

(Once again, using the square of the magnitudes of the components would work because the norm of the state vector is invariant under a change of basis.)
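
We can verify this numerically. The snippet below is a sketch using the same illustrative coefficients as above: the magnitudes of the components do not keep summing to the same value after a change of basis, while the sum of squared magnitudes does.

```python
import numpy as np

# Components of |psi> in the energy basis (illustrative values).
psi = np.array([0.5, 0.5])

# A new orthonormal basis rotated by 45 degrees (the "angular momentum" basis).
L1 = np.array([1.0, 1.0]) / np.sqrt(2)
L2 = np.array([1.0, -1.0]) / np.sqrt(2)
psi_new = np.array([L1 @ psi, L2 @ psi])        # components in the new basis

# Sum of magnitudes: 1.0 in the old basis, ~0.707 in the new one (not invariant).
print(np.abs(psi).sum(), np.abs(psi_new).sum())

# Sum of squared magnitudes: 0.5 in both bases (invariant under the rotation).
print((np.abs(psi) ** 2).sum(), (np.abs(psi_new) ** 2).sum())
```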

How do we Solve this?

We have shown that using the magnitudes of the components as probabilities does not work because the sum of probabilities does not remain $1$ under a change of basis. Suppose instead of the magnitude, we use an arbitrary function $f$ of the components as the probability:

$$P(E_i) = f(c_i)$$

Our goal is to find a function $f$ such that the sum of probabilities remains $1$ under a change of basis. Furthermore, it should keep the norm of the state vector invariant. Mathematically, we need to find a function $f$ such that:

$$\sum_i f(c_i) = 1, \qquad \sum_i c_i^2 = 1$$

Focusing on the second equation, let's zoom in on one coefficient $c_1$.

In order for the sum to equal $1$, $c_1^2$ must not be larger than $1$. Hence, $-1 \leq c_1 \leq 1$.

If $c_1$ is $\pm 1$, then all other coefficients must be $0$. If it is not and is instead some value $a$, then we can move it to the other side to get:

$$c_2^2 + c_3^2 + \cdots + c_n^2 = 1 - a^2$$

This means that the sum of squares of the other coefficients must be $1 - a^2$, and we can apply the same logic to the next coefficient $c_2$. $c_2$ must be within the range $-\sqrt{1 - a^2} \leq c_2 \leq \sqrt{1 - a^2}$. Once again, if it is not at the edge (say, $c_2 = b$), then we can move it again and apply the same logic to the next coefficient:

$$c_3^2 + \cdots + c_n^2 = 1 - a^2 - b^2$$

The key is that the last coefficient $c_n$ does not have a choice; it must be $c_n = \pm\sqrt{1 - c_1^2 - c_2^2 - \cdots - c_{n-1}^2}$. All this is to say that the first $n - 1$ components are independent of each other, with only their range being constrained by the other components. This means that changing one of them does not affect the others:

$$\frac{\partial c_i}{\partial c_j} = 0 \quad \text{for } i \neq j, \quad i, j < n$$

Another key intuition is that the probability of measuring $E_i$ should not depend on whether $|\psi\rangle$ points in the positive or negative direction of $|E_i\rangle$. In other words:

$$f(c_i) = f(-c_i)$$

This means that the function $f$ must be symmetric about the vertical axis, AKA an even function.

Going back to the first equation ($\sum_i f(c_i) = 1$), we can isolate the last coefficient $c_n$ because it behaves differently from the others:

$$f(c_1) + f(c_2) + \cdots + f(c_{n-1}) + f(c_n) = 1$$

Recall that $c_n$ must take on the value of $\pm\sqrt{1 - c_1^2 - \cdots - c_{n-1}^2}$. Plugging it into the equation, the sign does not matter because $f$ is an even function:

$$f(c_1) + f(c_2) + \cdots + f(c_{n-1}) + f\left( \sqrt{1 - c_1^2 - \cdots - c_{n-1}^2} \right) = 1$$

Next, take the derivative of both sides with respect to $c_j$ (one of the first $n - 1$ coefficients):

$$\sum_{i=1}^{n-1} \frac{\partial f(c_i)}{\partial c_j} + \frac{\partial}{\partial c_j} f\left( \sqrt{1 - c_1^2 - \cdots - c_{n-1}^2} \right) = 0$$

In the first term, because $c_j$ is independent of the other components, the only nonzero derivative is the one with $i = j$:

$$\sum_{i=1}^{n-1} \frac{\partial f(c_i)}{\partial c_j} = f'(c_j)$$

Applying the chain rule to the second term, with $c_n = \sqrt{1 - c_1^2 - \cdots - c_{n-1}^2}$ and hence $\partial c_n / \partial c_j = -c_j / c_n$:

$$\frac{\partial}{\partial c_j} f(c_n) = f'(c_n) \frac{\partial c_n}{\partial c_j} = -f'(c_n) \frac{c_j}{c_n}$$

Rearranging $f'(c_j) - f'(c_n) \frac{c_j}{c_n} = 0$ to isolate the $c_j$-dependence yields:

$$\frac{f'(c_j)}{c_j} = \frac{f'(c_n)}{c_n}$$

Of course, this is a general result for any component $c_j$. We can write the same equation for another component $c_k$ as follows:

$$\frac{f'(c_k)}{c_k} = \frac{f'(c_n)}{c_n}$$

Since both equations have equal right-hand sides, the left-hand sides must be equal as well:

$$\frac{f'(c_j)}{c_j} = \frac{f'(c_k)}{c_k}$$

This is similar to how separation of variables works when solving a partial differential equation. We have an equation of the form $g(c_j) = g(c_k)$, where $g(c) = f'(c)/c$. The key is that the left-hand side depends only on $c_j$ and the right-hand side depends only on $c_k$. If $g$ changed with its argument, then letting $c_j$ change while keeping $c_k$ constant would change the left-hand side but not the right-hand side. This would violate the equality, so $g$ must be a constant.

More explicitly, we can see this by differentiating both sides of $\frac{f'(c_j)}{c_j} = \frac{f'(c_k)}{c_k}$ with respect to $c_j$:

$$\frac{d}{d c_j} \left( \frac{f'(c_j)}{c_j} \right) = \frac{d}{d c_j} \left( \frac{f'(c_k)}{c_k} \right) = 0$$

This means that the derivative of the left-hand side with respect to $c_j$ is zero, which means that the left-hand side is a constant.

Since this applies to any $c_j$, we can rewrite the equation as:

$$\frac{f'(c)}{c} = 2k$$

for some constant $2k$ (the factor of $2$ is for later convenience). Rearranging:

$$f'(c) = 2kc$$

Finally, integrating both sides gives us the function $f$:

$$f(c) = kc^2 + C$$

The important takeaway is that the probability function must be a quadratic function of the components. In the next section, we find the constants $k$ and $C$ by using some properties of $f$ that need to be satisfied.

Solving for the Constants

First, consider what $f(0)$ should be. In other words, what is the probability of measuring $E_i$ when the state vector has no component along $|E_i\rangle$? It should intuitively be $0$.

Plugging this into our function $f(c) = kc^2 + C$:

$$f(0) = k \cdot 0^2 + C = 0$$

Thus $C = 0$.

Next, recall that the sum of probabilities must be $1$:

$$\sum_i f(c_i) = 1$$

Also remember that the sum of squares of the components must be $1$. This means that if $c_1 = 1$, then all other components must be $0$. Applying this to the sum of probabilities:

$$f(1) + f(0) + \cdots + f(0) = 1$$

We have already shown that $f(0) = 0$. Thus $f(1)$ must be $1$. Plugging this into the function:

$$f(1) = k \cdot 1^2 = k = 1$$

This means that $f(c) = c^2$. We have now found the exact expression for the probability function $f$. Generalizing to complex numbers (using the magnitude of the components):

$$f(c_i) = |c_i|^2$$

By convention, we take $\langle \psi | \psi \rangle = 1$, meaning that the magnitude of the state vector is $1$. Hence, the probability of measuring $E_i$ is the square of the magnitude of the component $c_i$:

$$P(E_i) = |c_i|^2$$

Finally, the coefficient $c_i$ can be found by taking the inner product of the state vector with the eigenvector: $c_i = \langle E_i | \psi \rangle$. Plugging this into the equation gives us the Born rule:

Born Rule: The probability of measuring an observable with eigenvalue $\lambda_i$, whose corresponding eigenvector is $|\lambda_i\rangle$, is:

$$P(\lambda_i) = |\langle \lambda_i | \psi \rangle|^2$$

The above derivation is a heuristic way to understand the Born rule. Gleason's theorem is a mathematical result that shows that the Born rule is the only way to assign probabilities to the outcomes of a quantum measurement.
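
To make the rule concrete, here is a small numerical sketch; the operator and state below are made up for illustration. We diagonalize a Hermitian operator, project a normalized state onto each eigenvector, and confirm that the squared magnitudes sum to $1$.

```python
import numpy as np

# A 2x2 Hermitian "energy" operator and a normalized state (arbitrary values).
H = np.array([[1.0, 0.5],
              [0.5, 2.0]])
energies, eigenstates = np.linalg.eigh(H)     # columns are |E_1>, |E_2>

psi = np.array([0.6, 0.8j])                   # |0.6|^2 + |0.8i|^2 = 1

# Born rule: P(E_i) = |<E_i|psi>|^2, where <E_i|psi> is the i-th projection.
probabilities = np.abs(eigenstates.conj().T @ psi) ** 2

print(probabilities)          # one probability per eigenvalue
print(probabilities.sum())    # 1.0
```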

Hermitian Adjoint and Hermitian Operators

Suppose we have an inner product $\langle \phi | \psi \rangle$, and suppose an operator $\hat{A}$ acts on the state vector $|\psi\rangle$. The inner product is then $\langle \phi | \hat{A} \psi \rangle$.

Recall that there is a fundamental dual correspondence between vectors and linear functionals, or kets and bras. The question is, is there a correspondence between the operator $\hat{A}$ and some other operator? In other words, is there another operator $\hat{A}^\dagger$ such that $\langle \phi | \hat{A} \psi \rangle = \langle \hat{A}^\dagger \phi | \psi \rangle$?

This other operator, $\hat{A}^\dagger$, is known as the Hermitian adjoint (or simply adjoint) of $\hat{A}$.

Properties of Hermitian Adjoint

  1. Applying the Hermitian adjoint twice gives the original operator:

     $$(\hat{A}^\dagger)^\dagger = \hat{A}$$

    To prove this, consider the inner product $\langle \phi | \hat{A} \psi \rangle$. As we know, this is equal to $\langle \hat{A}^\dagger \phi | \psi \rangle$. We can swap the two vectors (and take the complex conjugate) to get $\langle \psi | \hat{A}^\dagger \phi \rangle^*$. Then, by the definition of the Hermitian adjoint again, this is equal to $\langle (\hat{A}^\dagger)^\dagger \psi | \phi \rangle^*$. Finally, applying the conjugate symmetry again, it equals $\langle \phi | (\hat{A}^\dagger)^\dagger \psi \rangle$. This must equal our original expression of $\langle \phi | \hat{A} \psi \rangle$, so $(\hat{A}^\dagger)^\dagger = \hat{A}$.

  2. The adjoint of a sum of operators is the sum of the adjoints:

     $$(\hat{A} + \hat{B})^\dagger = \hat{A}^\dagger + \hat{B}^\dagger$$

    To prove this, consider the inner product $\langle \phi | (\hat{A} + \hat{B}) \psi \rangle$:

    • By linearity of the inner product, this is equal to $\langle \phi | \hat{A} \psi \rangle + \langle \phi | \hat{B} \psi \rangle$. Then, applying the Hermitian adjoint to each term yields $\langle \hat{A}^\dagger \phi | \psi \rangle + \langle \hat{B}^\dagger \phi | \psi \rangle$. Putting the terms back together, this is equal to $\langle (\hat{A}^\dagger + \hat{B}^\dagger) \phi | \psi \rangle$.
    • By the definition of the Hermitian adjoint, this is also equal to $\langle (\hat{A} + \hat{B})^\dagger \phi | \psi \rangle$.

    Both expressions are equal, so $(\hat{A} + \hat{B})^\dagger = \hat{A}^\dagger + \hat{B}^\dagger$.

  3. The adjoint of a product of operators is the product of the adjoints in reverse order (I will call this the product rule):

     $$(\hat{A} \hat{B})^\dagger = \hat{B}^\dagger \hat{A}^\dagger$$

    To prove this, consider the inner product $\langle \phi | \hat{A} \hat{B} \psi \rangle$:

    • By the definition of the Hermitian adjoint, this is equal to $\langle (\hat{A} \hat{B})^\dagger \phi | \psi \rangle$.
    • But we can also apply it to each operator separately: $\langle \phi | \hat{A} (\hat{B} \psi) \rangle = \langle \hat{A}^\dagger \phi | \hat{B} \psi \rangle = \langle \hat{B}^\dagger \hat{A}^\dagger \phi | \psi \rangle$.

    Both expressions are equal, so $(\hat{A} \hat{B})^\dagger = \hat{B}^\dagger \hat{A}^\dagger$.

  4. The adjoint of a scalar is its complex conjugate:

     $$c^\dagger = c^*$$

    This is also quite easy to prove:

    • By the linearity of the right side and the conjugate symmetry of the inner product: $\langle \phi | c \psi \rangle = c \langle \phi | \psi \rangle = \langle c^* \phi | \psi \rangle$.
    • By the definition of the Hermitian adjoint, this is equal to $\langle c^\dagger \phi | \psi \rangle$.

    Thus $c^\dagger = c^*$.

  5. The adjoint of an operator "flips" its input and output spaces. This is a bit more abstract, but it is a key property of the Hermitian adjoint. Another way to put it is: if an operator is defined to be $\hat{A} : V \to W$, then its adjoint is defined to be $\hat{A}^\dagger : W \to V$.

    To see why this is the case, consider the inner product $\langle \phi | \hat{A} \psi \rangle$, and suppose $|\psi\rangle \in V$ and $|\phi\rangle \in W$. In order for the inner product to be defined, both vectors must be in the same space. Since $|\phi\rangle$ has to be in $W$, then $\hat{A} |\psi\rangle$ must be in $W$ as well. Thus, $\hat{A}$ must act on $|\psi\rangle$, an element in $V$, to produce a vector in $W$. Thus $\hat{A}$ is defined as $\hat{A} : V \to W$.

    Next, using the definition of the Hermitian adjoint, $\langle \phi | \hat{A} \psi \rangle = \langle \hat{A}^\dagger \phi | \psi \rangle$. Now, $\hat{A}^\dagger |\phi\rangle$ must be in $V$ because $|\psi\rangle$ is in $V$. Thus, $\hat{A}^\dagger$ must act on $|\phi\rangle$, an element in $W$, to produce a vector in $V$.

    This is why the adjoint of an operator "flips" its input and output spaces.
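
The finite-dimensional picture makes this concrete. In the sketch below (an illustrative example with arbitrary values), the adjoint is the conjugate transpose, so a $3 \times 2$ matrix mapping $\mathbb{C}^2 \to \mathbb{C}^3$ has an adjoint mapping $\mathbb{C}^3 \to \mathbb{C}^2$, and the defining relation $\langle \phi | \hat{A} \psi \rangle = \langle \hat{A}^\dagger \phi | \psi \rangle$ still holds:

```python
import numpy as np

rng = np.random.default_rng(1)

# A rectangular operator A : C^2 -> C^3 and its adjoint A† = conj(A).T : C^3 -> C^2.
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
A_dag = A.conj().T

psi = rng.normal(size=2) + 1j * rng.normal(size=2)   # psi lives in C^2
phi = rng.normal(size=3) + 1j * rng.normal(size=3)   # phi lives in C^3

# np.vdot conjugates its first argument, matching <phi|psi> = sum(phi* psi).
lhs = np.vdot(phi, A @ psi)        # <phi | A psi>,  an inner product in C^3
rhs = np.vdot(A_dag @ phi, psi)    # <A† phi | psi>, an inner product in C^2
print(np.isclose(lhs, rhs))        # True
```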

More generally, the Hermitian adjoint is the operator that is used for the dual correspondence. We shall now consider applying the Hermitian adjoint to a ket vector. But wait - adjoints are applied onto operators, not vectors. What does it mean to take the adjoint of a vector?

For now, we can ignore this issue and just apply the adjoint to a vector anyway. To see what this yields, consider taking the adjoint of an inner product: $(\langle \phi | \psi \rangle)^\dagger$.

  • Since the inner product is a scalar, its adjoint is just its complex conjugate: $(\langle \phi | \psi \rangle)^\dagger = \langle \phi | \psi \rangle^*$. And since the inner product is conjugate symmetric, this is just $\langle \psi | \phi \rangle$.
  • By the product rule of the adjoint, we can swap the order of the two factors and take the adjoint of each: $(\langle \phi | \psi \rangle)^\dagger = (|\psi\rangle)^\dagger (\langle \phi |)^\dagger$.

Thus we have $(|\psi\rangle)^\dagger (\langle \phi |)^\dagger = \langle \psi | \phi \rangle$. Since this must hold for all vectors $|\psi\rangle$ and $|\phi\rangle$, we can conclude that the adjoint of a ket vector is the corresponding bra vector:

$$(|\psi\rangle)^\dagger = \langle \psi |, \qquad (\langle \psi |)^\dagger = |\psi\rangle$$

Of course, this is a bit hand-wavy, but it gives us an intuition behind why the adjoint of a vector is its corresponding bra. The appendix contains a more rigorous proof of this fact.
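
In matrix language this is the familiar statement that the dagger of a column vector is the conjugate-transposed row vector. A short sketch with illustrative values:

```python
import numpy as np

psi = np.array([[1.0 + 2.0j],
                [3.0 - 1.0j]])        # a ket: 2x1 column vector
phi = np.array([[0.5j],
                [2.0 + 0.0j]])

bra_psi = psi.conj().T                # (|psi>)† = <psi|: a 1x2 row vector
print(bra_psi)

# The adjoint of the scalar <phi|psi> is its complex conjugate, which equals <psi|phi>.
inner = (phi.conj().T @ psi).item()   # <phi|psi>
print(np.isclose(np.conj(inner), (bra_psi @ phi).item()))   # True
```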

Observables and Hermitian Operators

Recall that an observable is represented by an operator $\hat{A}$. Its eigenvectors form a basis representing the possible outcomes of the observable, and its corresponding eigenvalues are the values of the observable. Recall the three properties of observables that we discussed earlier:

  1. The eigenvalues of an observable are real (since they represent measurable quantities). In other words, $\lambda_i^* = \lambda_i$.
  2. The eigenvectors of an observable form a complete basis (since it should be possible to measure any value of the observable). In other words, any state can be written as $|\psi\rangle = \sum_i c_i |\lambda_i\rangle$.
  3. The eigenvectors of an observable are orthogonal (since they represent distinct outcomes). In other words, $\langle \lambda_i | \lambda_j \rangle = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta.

Imagine an observable $\hat{A}$ acting on a state vector $|\psi\rangle$. Since $\hat{A}$ has an orthonormal eigenbasis, we can expand $|\psi\rangle$ in terms of this basis:

$$\hat{A} |\psi\rangle = \hat{A} \sum_i c_i |\lambda_i\rangle = \sum_i c_i \hat{A} |\lambda_i\rangle$$

Since $|\lambda_i\rangle$ is an eigenvector of $\hat{A}$, acting on it just scales it by the eigenvalue $\lambda_i$:

$$\hat{A} |\psi\rangle = \sum_i c_i \lambda_i |\lambda_i\rangle$$

Next, the value for $c_i$ can be found by taking the inner product of the state vector with the eigenvector:

$$c_i = \langle \lambda_i | \psi \rangle$$

Rearranging and using the properties of Dirac notation yields:

$$\hat{A} |\psi\rangle = \sum_i \lambda_i \langle \lambda_i | \psi \rangle |\lambda_i\rangle = \left( \sum_i \lambda_i |\lambda_i\rangle \langle \lambda_i| \right) |\psi\rangle$$

This means that the operator $\hat{A}$ can be written as:

$$\hat{A} = \sum_i \lambda_i |\lambda_i\rangle \langle \lambda_i|$$

In the continuous case, such as the eigenbasis of the position operator, the sum becomes an integral:

$$\hat{x} = \int x \, |x\rangle \langle x| \, dx$$

For both cases, we can see what happens when we take the Hermitian adjoint of the operator. First, since the adjoint of a sum is the sum of adjoints:

$$\hat{A}^\dagger = \sum_i \left( \lambda_i |\lambda_i\rangle \langle \lambda_i| \right)^\dagger$$

By the product rule of the adjoint, this is equal to:

$$\hat{A}^\dagger = \sum_i \left( \langle \lambda_i| \right)^\dagger \left( |\lambda_i\rangle \right)^\dagger \lambda_i^\dagger$$

Since the adjoint of a bra/ket is the corresponding ket/bra, and the adjoint of a scalar is its complex conjugate, this simplifies to:

$$\hat{A}^\dagger = \sum_i \lambda_i^* |\lambda_i\rangle \langle \lambda_i|$$

Since eigenvalues must be real, the complex conjugate of $\lambda_i$ is just $\lambda_i$. Plugging this back into the expression for the adjoint, we notice that this is the same as the original operator $\hat{A}$:

$$\hat{A}^\dagger = \sum_i \lambda_i |\lambda_i\rangle \langle \lambda_i| = \hat{A}$$

Operators that are equal to their Hermitian adjoints are known as Hermitian operators.

The term $|\lambda_i\rangle \langle \lambda_i|$ deserves more attention. It is known as the outer product, and unlike the inner product which is just a scalar, the outer product is an operator. For a general case $|\phi\rangle \langle \chi|$, it acts on a state vector $|\psi\rangle$ by first taking the inner product of $|\psi\rangle$ with $|\chi\rangle$ to get a scalar $\langle \chi | \psi \rangle$, and then scaling $|\phi\rangle$ by this scalar. Hence, overall it outputs a state vector.

Because $|\lambda_i\rangle \langle \lambda_i|$ acts by finding the component of a state vector along $|\lambda_i\rangle$ and then scaling $|\lambda_i\rangle$ by this component, it is known as the projector onto the eigenvector $|\lambda_i\rangle$. It is sometimes denoted as $\Lambda_i \equiv |\lambda_i\rangle \langle \lambda_i|$ (e.g. in Sakurai).
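
The following sketch, using an arbitrary $2 \times 2$ Hermitian matrix as an example, rebuilds an operator from its eigenvalues and projectors and checks that the result is Hermitian:

```python
import numpy as np

A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])                # a Hermitian observable
eigenvalues, V = np.linalg.eigh(A)               # columns of V are |lambda_i>

A_rebuilt = np.zeros_like(A)
for lam, v in zip(eigenvalues, V.T):
    v = v.reshape(-1, 1)                         # column vector |lambda_i>
    projector = v @ v.conj().T                   # |lambda_i><lambda_i|
    A_rebuilt = A_rebuilt + lam * projector      # sum_i lambda_i |lambda_i><lambda_i|

print(np.allclose(A_rebuilt, A))                   # True: the spectral decomposition
print(np.allclose(A_rebuilt, A_rebuilt.conj().T))  # True: A = A†, i.e. Hermitian
```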

Expectation Value of an Observable

The expectation value of an observable $\hat{A}$ in a state $|\psi\rangle$ is the average value of the observable when measured many times. It is denoted as $\langle \hat{A} \rangle$.

In order to compute it, recall that the expectation value for any probability density function $p(x)$ is given by:

$$\langle x \rangle = \int x \, p(x) \, dx$$

In quantum mechanics, the expectation value of an observable is given by:

$$\langle \hat{A} \rangle = \langle \psi | \hat{A} | \psi \rangle$$

It is technically a postulate, but we can see why this is the case. We will do it in a continuous case, but it also applies to the discrete case.

Recall the completeness relation for the eigenvectors of a continuous observable, here the position operator:

$$\int |x\rangle \langle x| \, dx = \hat{I}$$

We can freely insert this relation into any expression. Let's put it into the right-hand side of the expectation value:

$$\langle \hat{x} \rangle = \langle \psi | \hat{x} \left( \int |x\rangle \langle x| \, dx \right) |\psi\rangle = \int \langle \psi | \hat{x} | x \rangle \langle x | \psi \rangle \, dx$$

Next, we can insert another completeness relation for the left-hand side ($\langle \psi | \hat{x}$). To avoid confusion with variables, we will use $x'$ as the variable of integration:

$$\langle \hat{x} \rangle = \iint \langle \psi | x' \rangle \langle x' | \hat{x} | x \rangle \langle x | \psi \rangle \, dx' \, dx$$

Now, we have a $\langle x' | \hat{x} | x \rangle$ term. Recall that $|x\rangle$ is an eigenvector of the position operator, so $\hat{x} |x\rangle$ is just a scaled vector $x |x\rangle$:

$$\langle \hat{x} \rangle = \iint \langle \psi | x' \rangle \, x \, \langle x' | x \rangle \langle x | \psi \rangle \, dx' \, dx = \iint \langle \psi | x' \rangle \, x \, \delta(x - x') \, \langle x | \psi \rangle \, dx' \, dx$$

There is a delta function in the middle, $\delta(x - x')$. This is only nonzero when $x = x'$, so we can replace $x'$ with $x$ everywhere and integrate over $x'$:

$$\langle \hat{x} \rangle = \int \langle \psi | x \rangle \, x \, \langle x | \psi \rangle \, dx$$

By conjugate symmetry, $\langle \psi | x \rangle = \langle x | \psi \rangle^*$. We can then combine both inner products to get the square magnitude (since $\langle x | \psi \rangle^* \langle x | \psi \rangle = |\langle x | \psi \rangle|^2$):

$$\langle \hat{x} \rangle = \int x \, |\langle x | \psi \rangle|^2 \, dx$$

Recall from the Born rule that $|\langle x | \psi \rangle|^2$ is the probability density function for the position operator. Thus, the expectation value of the position operator does indeed match the definition of the expectation value for a probability density function:

$$\langle \hat{x} \rangle = \int x \, p(x) \, dx, \qquad p(x) = |\langle x | \psi \rangle|^2$$
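
In the discrete, finite-dimensional case the same statement is easy to check numerically. The sketch below (with an arbitrary operator and state) compares $\langle \psi | \hat{A} | \psi \rangle$ with the probability-weighted average $\sum_i \lambda_i |\langle \lambda_i | \psi \rangle|^2$:

```python
import numpy as np

A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])                   # Hermitian observable
eigenvalues, V = np.linalg.eigh(A)

psi = np.array([0.6, 0.8j])                          # normalized state

expectation_direct = np.vdot(psi, A @ psi).real      # <psi|A|psi>

probabilities = np.abs(V.conj().T @ psi) ** 2        # P(lambda_i) = |<lambda_i|psi>|^2
expectation_weighted = np.sum(eigenvalues * probabilities)

print(np.isclose(expectation_direct, expectation_weighted))   # True
```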

Summary and Next Steps

In this chapter, we have introduced the concept of observables in quantum mechanics. Observables are quantities that can be measured, and they are represented by Hermitian operators.

Here are the key points to remember:

  • Observables are quantities that can be measured in quantum mechanics. They correspond to "real-life" or physical quantities.

  • Their eigenvectors form a basis representing the possible outcomes of the observable, and their eigenvalues are the values of the observable.

  • After making a measurement, the state vector collapses to the eigenvector corresponding to the measured eigenvalue.

  • By intuition, observables have the following three properties:

    1. The eigenvalues of an observable are real numbers.
    2. The eigenvectors of an observable form a complete basis.
    3. The eigenvectors of an observable are orthogonal.
  • The operators can be written as a sum over their eigenvalues and the projectors onto the corresponding eigenvectors:

    $$\hat{A} = \sum_i \lambda_i |\lambda_i\rangle \langle \lambda_i|$$

    In the continuous case, this becomes an integral:

    $$\hat{x} = \int x \, |x\rangle \langle x| \, dx$$

  • The operator $|\lambda_i\rangle \langle \lambda_i|$ is known as the projector onto the eigenvector $|\lambda_i\rangle$. It is sometimes denoted as $\Lambda_i$.

  • The Born rule states that the probability of measuring an observable with eigenvalue $\lambda_i$ is $P(\lambda_i) = |\langle \lambda_i | \psi \rangle|^2$. It followed intuitively from a few key insights about the nature of observables and the conditions for probabilities. Namely, probabilities must sum to $1$ in every basis, and the norm of the state vector is invariant under a change of basis.

  • The adjoint of an operator is the operator that is used for the dual correspondence between vectors and linear functionals. The adjoint of $\hat{A}$ is denoted as $\hat{A}^\dagger$, and is defined such that $\langle \phi | \hat{A} \psi \rangle = \langle \hat{A}^\dagger \phi | \psi \rangle$.

  • Operators that are equal to their Hermitian adjoints are known as Hermitian operators.

  • Hermitian adjoints have the following properties:

    1. Applying the Hermitian adjoint twice gives the original operator.
    2. The adjoint of a sum of operators is the sum of the adjoints.
    3. The adjoint of a product of operators is the product of the adjoints in reverse order.
    4. The adjoint of a scalar is its complex conjugate.
    5. The adjoint of an operator "flips" its input and output spaces.
  • Operators representing observables are Hermitian operators.

  • The expectation value of an observable $\hat{A}$ in a state $|\psi\rangle$ is the average value of the observable when measured many times. It is denoted as $\langle \hat{A} \rangle$ and is given by $\langle \hat{A} \rangle = \langle \psi | \hat{A} | \psi \rangle$.

In the next chapter, we will take a brief detour to discuss Poisson brackets in classical mechanics, and then link them to commutators in quantum mechanics, from which we will derive the uncertainty principle. Afterwards, we will discuss how operators can be represented as matrices, and how they can be diagonalized to find their eigenvectors and eigenvalues.

References

  • Quantum Sense, "Maths of Quantum Mechanics", a YouTube playlist.
  • J.J. Sakurai, "Modern Quantum Mechanics", sections 1.2-1.4.
  • This post on Math Stack Exchange.

Appendix: Review of Eigenvalues and Eigenvectors

An eigenvector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The scalar factor is known as the eigenvalue corresponding to that eigenvector.

Let $A$ be a linear transformation. If it has an eigenvector $\mathbf{v}$ with eigenvalue $\lambda$, then:

$$A \mathbf{v} = \lambda \mathbf{v}$$

Rearranging, we get:

$$(A - \lambda I) \mathbf{v} = 0$$

where $I$ is the identity matrix. In order for this equation to hold for a nonzero $\mathbf{v}$, the matrix $A - \lambda I$ must be singular, which means its determinant is zero. Intuitively, a zero determinant means that the matrix squishes space down to a lower dimension, which is why the vector goes to zero. Thus:

$$\det(A - \lambda I) = 0$$

This equation is known as the characteristic equation of the matrix $A$.

There is a shortcut to finding the eigenvalues of a $2 \times 2$ matrix. If $m$ is half the trace of the matrix (the sum of the diagonal elements) and $p$ is the determinant of the matrix, then the eigenvalues are:

$$\lambda_{1, 2} = m \pm \sqrt{m^2 - p}$$

This comes from the fact that the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues (one can see this geometrically by considering the area of the parallelogram formed by the eigenvectors).
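
A quick sanity check of the shortcut, using an arbitrary example matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])

m = np.trace(A) / 2                 # half the trace = mean of the eigenvalues
p = np.linalg.det(A)                # determinant = product of the eigenvalues
shortcut = np.array([m - np.sqrt(m**2 - p), m + np.sqrt(m**2 - p)])

print(shortcut)                           # [2. 5.]
print(np.sort(np.linalg.eigvals(A)))      # [2. 5.] — matches the direct computation
```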

Appendix: Why is the Hermitian Adjoint of a Ket its Corresponding Bra?

This proof is a bit more abstract, and it comes from this post on Math Stack Exchange. It borrows ideas from functional analysis and the dual correspondence between vectors and linear functionals.

Recall that Hermitian adjoints act on operators, not vectors. It would thus seem like we need to somehow interpret a vector as a linear transformation. This comes from our dual correspondence, where a vector can be thought of as a linear map. In fact, let's define this formally: for a vector $|\psi\rangle$, we can write its corresponding linear map as $A_{\psi} : \mathbb{C} \to \mathcal{H}$. $A_{\psi}$ is a linear map that takes a scalar and outputs a vector. The simplest form of $A_{\psi}$ is just to have it scale the vector by the input scalar:

$$A_{\psi}(c) = c |\psi\rangle$$

With this definition, the correspondence $|\psi\rangle \mapsto A_{\psi}$ is what underlies the canonical isomorphism between a Hilbert space $\mathcal{H}$ and its dual space $\mathcal{H}^*$.

Now that we have an idea of what it means to interpret a vector as a linear transformation, we can see what it means to take the adjoint of said transformation. Since this transformation is $A_{\psi} : \mathbb{C} \to \mathcal{H}$, its adjoint is $A_{\psi}^\dagger : \mathcal{H} \to \mathbb{C}$ (by the 5th property of the Hermitian adjoint). We already know that linear functionals are also maps $\mathcal{H} \to \mathbb{C}$, so it seems like the adjoint of a vector is a bra (linear functional). But is it exactly the bra $\langle \psi |$, the one that corresponds to $|\psi\rangle$? We can see this by applying the adjoint of a vector to another vector (which should output a scalar):

$$A_{\psi}^\dagger |\phi\rangle$$

Since $\mathbb{C}$ is also an inner product space, this scalar is the same thing as $\langle 1, A_{\psi}^\dagger \phi \rangle_{\mathbb{C}}$, where I added a subscript to the inner product to denote that it is the inner product in $\mathbb{C}$. By the definition of the Hermitian adjoint, this is equal to $\langle A_{\psi} 1, \phi \rangle_{\mathcal{H}}$ (notice that the inner product changed from $\mathbb{C}$ to $\mathcal{H}$). But looking at the left-hand side of the inner product, recall that $A_{\psi}$ applied to a scalar just scales the vector by the scalar, so $A_{\psi} 1 = |\psi\rangle$. So all this is equal to $\langle \psi, \phi \rangle_{\mathcal{H}}$, which is precisely the inner product of $|\psi\rangle$ and $|\phi\rangle$. Thus, $A_{\psi}^\dagger |\phi\rangle = \langle \psi | \phi \rangle$, and so $A_{\psi}^\dagger = \langle \psi |$.